Biostatistics For Dummies (Monika Wahi John Pezzullo)

© John Wiley & Sons, Inc.

FIGURE 12-6: A general way of naming the cells of a cross-tab table.

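Using these conventions ($O_{i,j}$ for the observed count in each cell, $E_{i,j}$ for the expected count, $R_i$ and $C_j$ for the row and column totals, $T$ for the grand total, and $r$ and $c$ for the numbers of rows and columns), the basic formulas for the Pearson chi-square test are as follows:

Expected values:

$$E_{i,j} = \frac{R_i \times C_j}{T}$$

Chi-square statistic:

$$\chi^2 = \sum_{i,j} \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}}$$

Degrees of freedom:

$$df = (r - 1)(c - 1)$$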

where i and j are array indices that indicate the row and column, respectively, of each cell.
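As a check on the three formulas, here is a minimal Python sketch (standard library only) that applies them to a summarized cross-tab stored as a list of rows. The function name `pearson_chi_square` and the 2 × 2 counts are hypothetical, chosen only for illustration; the sketch does not compute a p value or apply any continuity correction.

```python
def pearson_chi_square(observed):
    """Return (expected, chi_square, df) for a table of observed counts.

    observed is a list of rows; observed[i][j] is the count in row i, column j.
    """
    n_rows = len(observed)
    n_cols = len(observed[0])

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_rows)) for j in range(n_cols)]
    grand_total = sum(row_totals)

    # Expected count for each cell: (row total x column total) / grand total
    expected = [
        [row_totals[i] * col_totals[j] / grand_total for j in range(n_cols)]
        for i in range(n_rows)
    ]

    # Sum of (observed - expected)^2 / expected over every cell
    chi_square = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(n_rows)
        for j in range(n_cols)
    )

    # Degrees of freedom: (rows - 1) x (columns - 1)
    df = (n_rows - 1) * (n_cols - 1)
    return expected, chi_square, df


# Hypothetical 2x2 table: rows = two groups, columns = outcome yes / no
table = [[20, 30],
         [10, 40]]
expected, chi_square, df = pearson_chi_square(table)
print(expected)                   # [[15.0, 35.0], [15.0, 35.0]]
print(round(chi_square, 4), df)   # 4.7619 1
```

For this made-up table, each expected count is 15 or 35, giving χ² = 100/21 ≈ 4.76 with 1 degree of freedom.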

Pointing out the pros and cons of the chi-square test

The Pearson chi-square test is very popular for several reasons:

It’s easy! The calculations are simple enough to do manually in Microsoft Excel (although this is not recommended because the risk of making a typing mistake is high). As described earlier, statistical software packages like the ones discussed in Chapter 4 can perform the chi-square test on both individual-level data and summarized cross-tabulated data. Several websites can also perform the test, and it has been implemented in apps for smartphones and tablets.

It’s flexible! The test works for tables with any number of rows and columns, and it easily handles cell counts of any magnitude. Statistical software can usually complete the calculations quickly, even on big data sets.
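The points about individual-level data and tables of any size can be illustrated with another standard-library Python sketch. It assumes each subject is recorded as a hypothetical (group, outcome) pair, cross-tabulates those records into an r × c table, and then returns the chi-square statistic, its degrees of freedom, and an approximate p value. The helper names (`cross_tab`, `chi_square_sf`, `chi_square_test`) and the data are made up; the p value uses the closed-form upper tail of the chi-square distribution for whole-number degrees of freedom rather than any particular package's routine, so treat it as a sketch rather than a replacement for the statistical software discussed in Chapter 4.

```python
import math
from collections import Counter


def cross_tab(records):
    """Turn individual-level (row_category, column_category) pairs into a count table."""
    counts = Counter(records)
    rows = sorted({r for r, _ in records})
    cols = sorted({c for _, c in records})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


def chi_square_sf(x, df):
    """P(X >= x) for a chi-square variable with whole-number df >= 1 (closed form)."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum of y**i / i! for i = 0 .. df/2 - 1
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc(sqrt(y)) plus exp(-y) * y**(i - 1/2) / Gamma(i + 1/2) for i = 1 .. (df-1)/2
    tail = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return tail


def chi_square_test(table):
    """Pearson chi-square statistic, degrees of freedom, and approximate p value."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi_square_sf(stat, df)


# Hypothetical individual-level data: one (group, outcome) pair per subject
records = (
    [("drug", "improved")] * 18 + [("drug", "same")] * 7 + [("drug", "worse")] * 5
    + [("placebo", "improved")] * 10 + [("placebo", "same")] * 12 + [("placebo", "worse")] * 8
)

rows, cols, table = cross_tab(records)
stat, df, p = chi_square_test(table)
print(rows, cols, table)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.3f}")
# chi-square = 4.294, df = 2, p = 0.117
```

Here the 2 × 3 table has (2 − 1)(3 − 1) = 2 degrees of freedom. A real analysis would normally use a tested routine in a statistics package; the hand-written tail function is included only to keep the example self-contained.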

But the chi-square test has some shortcomings:

It’s not an exact test. The p value it produces is only approximate, so using p ≤ 0.05 as your criterion for statistical significance (meaning setting α = 0.05) doesn’t necessarily guarantee that your Type I error rate will be only 5 percent. Remember, your Type I error rate is the probability that you will declare a difference statistically significant when no real difference exists (see Chapter 3 for an introduction to Type I errors). The level of accuracy of the statistical significance is high when all